Underfitting vs. Overfitting
Underfitting
Common solutions
- include more features
- try a more complicated model
- for neural networks (Recurrent Neural Networks, Convolutional Neural Networks):
    - find a better network architecture and hyperparameters
    - train longer or use a better optimization algorithm (e.g. variants of Gradient Descent)
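A minimal numpy sketch of what underfitting looks like in practice (the data and model degrees are hypothetical): a straight line fit to curved data has high error even on the training set, while a more flexible model drives the training error down to the noise level.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical curved data: y = x^2 plus Gaussian noise
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(scale=0.5, size=x.size)

# Degree-1 polynomial underfits the curve; degree-2 matches it
mses = {}
for degree in (1, 2):
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    mses[degree] = np.mean((y - pred) ** 2)
    print(f"degree {degree}: training MSE = {mses[degree]:.2f}")
```

The high degree-1 training error is the signature of underfitting: the model is too simple to capture the signal, so even the data it was fit on is poorly predicted.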
Overfitting
Common solutions
- try a simpler model
- collect more training data
- Feature selection: choose which features to include or exclude when there are many features
- Regularization => penalize large parameter values to reduce overfitting during estimation
- for neural networks (Recurrent Neural Networks, Convolutional Neural Networks): find a better network architecture and hyperparameters
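A minimal sketch of regularization, using closed-form ridge regression as an example (the data, dimensions, and penalty values are hypothetical). With few samples and many features the unregularized fit chases noise; increasing the penalty λ shrinks the coefficients toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: few samples, many features -> easy to overfit
X = rng.normal(size=(20, 15))
w_true = np.zeros(15)
w_true[:3] = [2.0, -1.0, 0.5]          # only 3 features actually matter
y = X @ w_true + rng.normal(scale=0.3, size=20)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

norms = {}
for lam in (0.0, 1.0, 10.0):
    w = ridge(X, y, lam)
    norms[lam] = np.linalg.norm(w)
    print(f"lambda = {lam:5.1f}  ||w|| = {norms[lam]:.2f}")
```

The coefficient norm decreases monotonically as λ grows, which is the sense in which regularization "reduces overfitting during estimation": it trades a little bias for much lower variance.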
Estimates of out-of-sample accuracy => estimate the degree of overfitting
- Cross-validation => test the model’s predictive accuracy on held-out samples
- Information criteria => construct a theoretical estimate of the relative out-of-sample Kullback-Leibler Divergence (Information theory and Entropy in Neuroscience#^aa57f9)
    - Akaike Information Criterion (AIC)
    - Deviance Information Criterion (DIC): a more general version of AIC
    - Widely Applicable Information Criterion (WAIC): even more general than AIC and DIC
    - Pareto-Smoothed Importance Sampling (PSIS): approximates leave-one-out cross-validation, often reported alongside WAIC
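A minimal numpy sketch of both ideas side by side (all data, the fold count, and the candidate models are hypothetical): k-fold cross-validation scores each model on held-out folds, and a Gaussian AIC (here written up to an additive constant as n·log(RSS/n) + 2k) penalizes the training fit by the number of parameters. Both should prefer the model that matches the true curve over one that underfits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: quadratic trend plus noise
x = np.linspace(-3, 3, 60)
y = x**2 + rng.normal(scale=1.0, size=x.size)

def kfold_mse(x, y, degree, k=5):
    """Average held-out MSE of a degree-d polynomial over k folds."""
    idx = rng.permutation(x.size)
    scores = []
    for fold in np.array_split(idx, k):
        tr = np.setdiff1d(idx, fold)
        coeffs = np.polyfit(x[tr], y[tr], degree)
        scores.append(np.mean((y[fold] - np.polyval(coeffs, x[fold])) ** 2))
    return np.mean(scores)

def aic(x, y, degree):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 1                       # number of fitted coefficients
    return x.size * np.log(rss / x.size) + 2 * k

for degree in (1, 2):
    print(f"degree {degree}: CV MSE = {kfold_mse(x, y, degree):.2f}, "
          f"AIC = {aic(x, y, degree):.1f}")
```

Cross-validation actually holds data out, while AIC only approximates the out-of-sample penalty theoretically; the note's hierarchy (AIC -> DIC -> WAIC) generalizes that approximation to broader model classes.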
BUT! Do not use predictive criteria to choose a causal estimate: predictive criteria can actually prefer confounded models (Causal inference#^b7b0a6).